FIX: Reconcile traefik service with canary at 0 #1692

Open
wants to merge 1 commit into base: main


joaosilva15

Setting the weight to 100 on both services sends 50% of the traffic to
each service. This made our canary enter an infinite loop while
promoting a new version and the traefik service got altered.

The traefik service should not be changed, as it is managed by flagger,
but getting stuck in an infinite loop is not great. The loop happened
because, during promotion with `StepWeightPromotion`, the weights are
reset when the traefik service gets reconciled. After that, `GetRoutes`
makes [this
calculation](https://github.com/fluxcd/flagger/blob/9a224a0c906354fcfcbc01d4d2df987389301e68/pkg/router/traefik.go#L163-L164)
for the weights, which returns 0 for the canary, and the controller is
then unable to exit
[this block](https://github.com/fluxcd/flagger/blob/v1.36.1/pkg/controller/scheduler.go#L491-L546).
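
To illustrate the mismatch, here is a minimal sketch. It assumes the linked calculation reads the primary weight as a percentage and gives the canary the remainder; `assumedCanaryWeight` and `effectiveShare` are hypothetical names used only for this example and are not flagger's actual code.

```go
package main

import "fmt"

// assumedCanaryWeight is a simplified stand-in for the percentage-style
// reading described above: the primary weight is treated as a percentage
// and the canary gets whatever is left. Assumption, not flagger's code.
func assumedCanaryWeight(primaryWeight int) int {
	return 100 - primaryWeight
}

// effectiveShare returns the share of traffic (in percent) that a backend
// receives when its weight is relative to the total of all weights.
func effectiveShare(weight, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(weight) * 100 / float64(total)
}

func main() {
	primary, canary := 100, 100 // weights after the service was altered

	// Relative weights: each backend gets half of the traffic.
	fmt.Printf("effective canary share: %.0f%%\n", effectiveShare(canary, primary+canary))

	// Percentage-style reading: the canary appears to receive nothing.
	fmt.Printf("canary weight read as percentage: %d\n", assumedCanaryWeight(primary))
}
```

With both weights at 100, the canary actually receives half of the traffic, while the percentage-style reading reports 0 for it.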

Besides this change, do you know why we are treating the weights as
percentages? Should I also change `GetRoutes` to calculate the
percentage from the weights, or is it coded like that because flagger
is expected to keep the weights within those constraints?
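
As a rough idea of what calculating the percentage from the weights could look like, here is a sketch of a normalizing helper. `weightsToPercentages` is a hypothetical function, not part of flagger, and the rounding rule (round the canary share, give the remainder to the primary) is an assumption made for the example.

```go
package main

import "fmt"

// weightsToPercentages is a hypothetical helper (not part of flagger) that
// converts two relative weights into percentages that always sum to 100.
func weightsToPercentages(primaryWeight, canaryWeight int) (primaryPct, canaryPct int, err error) {
	total := primaryWeight + canaryWeight
	if primaryWeight < 0 || canaryWeight < 0 || total == 0 {
		return 0, 0, fmt.Errorf("invalid weights: primary=%d canary=%d", primaryWeight, canaryWeight)
	}
	// Round the canary share to the nearest integer and give the
	// remainder to the primary so the two values sum to 100.
	canaryPct = (canaryWeight*100 + total/2) / total
	primaryPct = 100 - canaryPct
	return primaryPct, canaryPct, nil
}

func main() {
	// Each pair is {primary weight, canary weight}.
	for _, w := range [][2]int{{100, 100}, {100, 0}, {80, 20}} {
		p, c, err := weightsToPercentages(w[0], w[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("weights %v -> primary %d%%, canary %d%%\n", w, p, c)
	}
}
```

With this approach, weights of 100 and 100 would be reported as 50 and 50 instead of 100 and 0, so the reported values would no longer depend on the weights already summing to 100.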

Signed-off-by: Joao Pedro Silva <[email protected]>